GeoName: A System For Back-Transliterating Pinyin Place Names

Authors

  • Kui Lam Kwok
  • Qiang Deng
Abstract

To be unambiguous about a Chinese geographic name represented in English text as Pinyin, one needs to recover the name in Chinese characters. We present our approach to this back-transliteration problem, based on processes such as bilingual geographic name lookup, name suggestion using place-name character and character-pair frequencies, and confirmation via a collection of monolingual names or the WWW. Evaluation shows that about 48% to 72% of the correct names can be recovered as the top candidate, and 82% to 86% within the top ten, depending on the processes employed.
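The three processes named in the abstract (lookup, frequency-based suggestion, confirmation) can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so the log-frequency sum, the confirmation bonus, every function and variable name, and all of the toy counts below are assumptions made for illustration, not the authors' method or data.

```python
import math
from itertools import product

# All data below is made up for illustration; it is NOT from the paper.
# Hypothetical bilingual gazetteer: Pinyin string -> known Chinese names.
BILINGUAL_GAZETTEER = {"beijing": ["北京"]}

# Hypothetical counts of how often a character appears in place names,
# grouped by its (toneless) Pinyin syllable.
CHARS_BY_SYLLABLE = {
    "nan": {"南": 50, "楠": 2},
    "jing": {"京": 40, "井": 5, "靖": 5},
}

# Hypothetical counts of adjacent character pairs in place names.
PAIR_FREQ = {("南", "京"): 30}

# Hypothetical monolingual name list used for confirmation.
KNOWN_NAMES = {"南京", "北京"}

CONFIRM_BONUS = 5.0  # assumed boost for candidates found in KNOWN_NAMES


def suggest(syllables, top_n=10):
    """Return up to top_n candidate Chinese names for a list of Pinyin syllables."""
    # Step 1: direct bilingual lookup.
    key = "".join(syllables)
    if key in BILINGUAL_GAZETTEER:
        return list(BILINGUAL_GAZETTEER[key])[:top_n]

    # Step 2: suggest names from character and character-pair frequencies.
    options = [CHARS_BY_SYLLABLE.get(s, {}) for s in syllables]
    if not all(options):
        return []  # at least one syllable has no known characters

    scored = []
    for combo in product(*options):
        score = sum(math.log(opts[ch]) for ch, opts in zip(combo, options))
        score += sum(
            math.log(1 + PAIR_FREQ.get(pair, 0))
            for pair in zip(combo, combo[1:])
        )
        name = "".join(combo)
        # Step 3: confirmation against a monolingual name collection.
        if name in KNOWN_NAMES:
            score += CONFIRM_BONUS
        scored.append((score, name))

    # Highest score first; ties broken by the name string for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_n]]


if __name__ == "__main__":
    print(suggest(["bei", "jing"]))
    print(suggest(["nan", "jing"]))
```

Enumerating every character combination is only workable for short names; a real system would likely prune the search, and the confirmation step could equally query a web search index instead of a static name list.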

Related Resources

Corpus-Based Pinyin Name Resolution

For readers of English text who know some Chinese, Pinyin codes that spell out Chinese names are often ambiguous as to their original Chinese character representations if the names are new or not well known. For English-Chinese cross language retrieval, failure to accurately translate Pinyin names in a query to Chinese characters can lead to dismal retrieval effectiveness. This paper presents a...

A hybrid back-transliteration system for Japanese

Transliterating words and names from one language to another is a frequent and highly productive phenomenon. Transliteration loses information, since important distinctions are not preserved in the process. Hence, automatically converting transliterated words back into their original form is a real challenge. In addition, due to its wide applicability in MT and CLIR, it is an interesting pr...

Improving Back-Transliteration by Combining Information Sources

Transliterating words and names from one language to another is a frequent and highly productive phenomenon. Transliteration loses information, since important distinctions are not preserved in the process. Hence, automatically converting transliterated words back into their original form is a real challenge. However, due to wide applicability in MT and CLIR, it is a computationally interes...

Finding Ideographic Representations of Japanese Names Written in Latin Script via Language Identification and Corpus Validation

Multilingual applications frequently involve dealing with proper names, but names are often missing in bilingual lexicons. This problem is exacerbated for applications involving translation between Latin-scripted languages and Asian languages such as Chinese, Japanese and Korean (CJK) where simple string copying is not a solution. We present a novel approach for generating the ideographic repre...

Extracting Transliteration Pairs from Comparable Corpora

Transliterating words and names from one language to another is a frequent and highly productive phenomenon. For example, the English word cache is transliterated in Japanese as キャッシュ “kyasshu”. In many cases, recent transliterations are not recorded in machine-readable dictionaries, so it is impossible to rely on dictionary lookup to find transliteration equivalents. In this paper we describe a meth...

Publication date: 2003